ABSTRACT

The present paper represents a demonstration of how 
LISREL V can be used to investigate scale invariance (1) across time 
(its relationship to test-retest reliability), and (2) across groups. 
Five criteria were established to test scale invariance across time 
and four criteria were established to test scale invariance across 
groups. Using the Coopersmith Self-Esteem Inventory for Children, six 
models were developed to test the above criteria with covariance 
matrices obtained from the responses of 722 Black, White, and 
Hispanic elementary students. Results indicated that correlated 
uniquenesses existed across time and this produced an overestimate of 
the test-retest reliability. In addition, the construct of 
self-concept was shown to be invariant across the three ethnic 
groups. Thus, LISREL procedures appear to provide a useful technique 
for studying scale invariance both within and between subjects. 
(Author/PN) 
MEASURING SCALE INVARIANCE BETWEEN AND WITHIN SUBJECTS



Jeri Benson
Dennis Hocevar
University of Southern California



Abstract

The present paper represented a demonstration of how LISREL
V can be used to investigate scale invariance (a) across time
and its relationship to test-retest reliability and (b)
across groups. Five criteria were established to test scale
invariance across time and four criteria were established to
test scale invariance across groups. Using a well-known
self-concept instrument, six models were developed to test
the above criteria using covariance matrices obtained from
the responses of 722 Black, White and Hispanic elementary
students. Results indicated that correlated uniquenesses
existed across time and this produced an overestimate of the
test-retest reliability. In addition, the construct of
self-concept was shown to be invariant across the three
ethnic groups. Thus, LISREL procedures appear to provide a
useful technique for studying scale invariance both within
and between subjects.
Measuring Scale Invariance Between and Within Subjects

The purpose of the present paper was to describe several
criteria for evaluating scale invariance and to provide a
pedagogical exposition of how scale invariance can be
quantified in terms of Joreskog and Sorbom's (1981) LISREL
schema. Scales can be invariant in two distinct ways. First, a
scale can be invariant across time; this type of invariance
is analogous to test-retest reliability. Second, scales can
be invariant across groups; this type of invariance is
similar to the concept of factorial invariance in the factor
analytic literature.

The major focus of the present study was invariance
across time and its relationship to test-retest reliability.
According to Magnusson (1966), reliability can be defined as
the 'correlation between two parallel tests' (p. 62).
Parallel tests can be defined as the same test given on two
occasions or two content-similar tests given on the same
occasion.

Reliability theory is based upon the model presented by
Spearman, where the observed score for individual j is equal
to that individual's true score plus error score, as shown in
formula 1.

    X_j = T_j + E_j    (1)

When different scores result for the same individual
on two testings, the difference is attributed to
chance or random error. The assumptions regarding these
errors indicate that over an infinite number of testings an
individual's mean error score will be zero, the errors are
thought not to correlate with the individual's true score,
and the errors themselves are considered to be uncorrelated
(Magnusson, 1966, p. 64). Using the above three assumptions,
an individual's observed score (X_j) is thought to be a
representation of their true score (T_j). Thus, reliability
estimates are calculated using the observed test score data
and are interpreted as the ratio of true score variance to
the total observed test score variance. This interpretation
is based upon the above assumptions regarding errors of
measurement. Of particular interest here is the assumption
that the errors themselves are not correlated for each
individual. However, in many testing situations the errors may
indeed be correlated. Maxwell (1968) illustrated how
correlated errors would affect internal consistency estimates by
using an ANOVA model that tested whether the covariance
between items was greater than zero. If the item covariances
were greater than zero, the internal consistency estimate
was considered biased, and the bias could produce an
over- or an underestimate.
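
The effect of correlated errors on a test-retest coefficient can be sketched numerically. The function below is only an illustration under simplifying assumptions that are not taken from the study: the same true score on both occasions, equal error variances, and errors uncorrelated with the true score. The function name and the variance components are hypothetical.

```python
def retest_correlation(true_var, error_var, error_cov=0.0):
    """Correlation between X1 = T + E1 and X2 = T + E2.

    Assumes the same true score T on both occasions, equal error
    variances, and errors uncorrelated with T (illustrative assumptions).
    """
    observed_var = true_var + error_var   # Var(X1) = Var(X2)
    observed_cov = true_var + error_cov   # Cov(X1, X2)
    return observed_cov / observed_var


# Hypothetical variance components, chosen only for illustration.
uncorrelated = retest_correlation(6.0, 4.0)               # Var(T) / Var(X)
positive = retest_correlation(6.0, 4.0, error_cov=1.0)    # overestimate
negative = retest_correlation(6.0, 4.0, error_cov=-1.0)   # underestimate
```

With uncorrelated errors the coefficient equals the ratio of true score variance to observed variance; a positive error covariance raises it above that ratio and a negative one lowers it.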

A second focus of the present paper was to illustrate how
one could investigate scale invariance across independent
groups. The invariance of psychometric properties across
independent groups has received extensive attention in the
factor analytic literature. Typical concerns are the
invariance of factor structures, factor variances and
covariances, and factor uniquenesses. Prior discussions of
invariance across groups using the LISREL procedure have been
provided by Benson (1982), Benson, Hocevar and Cohen (1982),
Joreskog (1971), McGaw and Joreskog (1971) and Sorbom (1974).
In addition, Werts, Rock, Linn and Joreskog (1976) have shown
that it is possible to test the equality of
variance-covariance matrices between and within subjects with
tests of different lengths.

Until recently, statistical procedures were not available
to test for correlated errors in the test-retest coefficient
or to test simultaneously for scale invariance across
groups. With the development of model testing using linear
structural relationships (LISREL) by Joreskog and
Sorbom (1981), the tenability of the assumption of
uncorrelated errors of measurement across time can be tested, as
well as the stability of the scale across groups.
Specifically, LISREL V allows the testing of differences in factor
structure, true score variance and correlated errors of
measurement within and between groups across time. Thus, the
major objective of the paper, while using data representing
a substantive content area regarding the measurement of
self-concept, was mainly a demonstration of how LISREL V can
be utilized (a) to answer questions regarding the invariance
of measurements across time and (b) to test scale invariance
across groups.
Methodology

The data represent the scores of elementary students in
grades three to six from over 70 schools in a large urban
school district. Matched pretest and posttest scores were
obtained for the students, resulting in a sample of 722. The
sample was composed of 395 White students, 213 Black students
and 114 Hispanic students; 505 were boys and 217 were girls.



The instrument used in the study was the Coopersmith
Self-Esteem Inventory for Children, Form B. The instrument
contains 25 items; eight are positively phrased and 17 are
negatively phrased. The response format is dichotomous:
'like me' or 'unlike me'. Form B of the scale
was developed by Coopersmith (1975) by selecting items which
had the highest item/total correlations on Form A, the
longer version of the Coopersmith Inventory. Due to the nature
of the scale's development, the factor structure was assumed
to be unidimensional both across groups and within groups.
The 25-item scale was administered by an elementary school
counselor to the students in both the pretest and posttest
sessions.
The covariance matrix was used as input in testing all
models under each of the three scale invariance conditions.
In addition, one item was arbitrarily selected as a
reference loading and its value was set at 1.0 for all analyses
in estimating the Lambda, Phi and Theta values. The Lambda
matrix represented the factor loadings (item/total
regressions) for each item on the one-factor scale. The Phi matrix
represented the true score variance for the scale. The
Theta matrix represented the item error variance (uniqueness)
for each item. Depending upon the model tested, parameters
were either set to be invariant (fixed) and given a value of
zero, or left free to be estimated and given a value of one.
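
How the three matrices combine can be illustrated with the standard one-factor model, in which the model-implied covariance between items i and j is lambda_i * lambda_j * phi, with the uniqueness theta_i added on the diagonal. The sketch below assumes uncorrelated uniquenesses; the function name and all numeric values are hypothetical and are not estimates from the study.

```python
def implied_covariance(loadings, factor_var, uniquenesses):
    """Model-implied covariance matrix for a one-factor model.

    Sigma[i][j] = lambda_i * lambda_j * phi, plus theta_i when i == j.
    Uniquenesses are assumed to be uncorrelated in this sketch.
    """
    n = len(loadings)
    sigma = [[loadings[i] * loadings[j] * factor_var for j in range(n)]
             for i in range(n)]
    for i in range(n):
        sigma[i][i] += uniquenesses[i]
    return sigma


# Hypothetical values; the first loading is the reference loading (1.0).
lam = [1.0, 0.5, 2.0]
phi = 0.25
theta = [0.25, 0.5, 0.125]
sigma = implied_covariance(lam, phi, theta)
```

Fitting a model amounts to choosing the free Lambda, Phi and Theta values so that this implied matrix reproduces the observed covariance matrix as closely as possible.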

Scale Invariance Across Time. Scale invariance across
time can be conceptualized in terms of at least five
criteria:

1. Are the factor loadings (item-total regressions)
   invariant across time? This involves simultaneously testing
   the LISREL item-total regression coefficients (Lambda
   estimates) from time 1 to time 2.

2. Are the true score variances invariant across time?
   Statistically, this would involve comparing the
   estimated true score variance (Phi) for time 1 with the
   estimated true score variance for time 2.

3. Is the item error variance (uniqueness) invariant
   across time? This would involve a simultaneous test of
   the equality of the item error variances (Theta
   variance estimates) from time 1 to time 2.

4. Are the item uniquenesses for each item at time 1
   correlated with their respective item uniquenesses at
   time 2? This involves simultaneously testing the
   correlation of each item's uniqueness at time 1 with its
   uniqueness at time 2 (Theta covariance estimates).

5. Are the estimated true scores for time 1 and time 2
   correlated? This can be observed by freeing the item
   uniquenesses and noting changes in the test-retest
   reliability estimate.

For studying scale invariance across time, one group of
students was arbitrarily selected, the White students. Six
models were constructed to test the above questions using
LISREL V.

a) Model 1 - The factor structure, true score variance and
   error variance from time 1 to time 2 were invariant.
   (Invariant model)

b) Model 2 - The factor structure was free to vary across
   time, but the true score variance and error variance
   from time 1 to time 2 were invariant. (Lambda free)

c) Model 3 - The true score variance was free to vary
   across time, but the factor structure and error
   variance from time 1 to time 2 were invariant. (Phi free)

d) Model 4 - The total amount of item uniqueness for
   each item was free to vary across time, but the factor
   structure and true score variance from time 1 to time 2
   were invariant. (Theta variance free)

e) Model 5 - The individual item error covariances were
   free to vary across time, but the factor structure and
   true score variance from time 1 to time 2 were
   invariant. (Theta covariance free)

f) Model 6 - The factor structure, true score variance and
   item errors were free to differ from time 1 to time 2.
   (Unrestricted model)
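
The degrees of freedom reported in Table 1 for these six models can be recovered by counting free parameters. The sketch below assumes a parameterization that is inferred from the text rather than stated in it: 25 items at each of two occasions (50 observed variables), one reference loading fixed per occasion, a true score covariance that is always estimated, and 25 same-item error covariances when these are freed. All names are hypothetical.

```python
N_ITEMS = 25
N_OCCASIONS = 2


def degrees_of_freedom(lambda_free, phi_free, theta_var_free, theta_cov_free):
    """df = distinct variances/covariances minus free parameters."""
    n_observed = N_ITEMS * N_OCCASIONS
    moments = n_observed * (n_observed + 1) // 2          # 1275
    loadings = (N_ITEMS - 1) * (2 if lambda_free else 1)  # reference item fixed
    factor_vars = 2 if phi_free else 1
    factor_cov = 1                                        # time 1 with time 2
    unique_vars = N_ITEMS * (2 if theta_var_free else 1)
    unique_covs = N_ITEMS if theta_cov_free else 0
    n_free = loadings + factor_vars + factor_cov + unique_vars + unique_covs
    return moments - n_free


# Flags: (lambda free, phi free, theta variance free, theta covariance free)
models = {
    1: (False, False, False, False),  # invariant model
    2: (True, False, False, False),   # Lambda free
    3: (False, True, False, False),   # Phi free
    4: (False, False, True, False),   # Theta variance free
    5: (False, False, False, True),   # Theta covariance free
    6: (True, True, True, True),      # unrestricted model
}
df_by_model = {m: degrees_of_freedom(*flags) for m, flags in models.items()}
```

Under these assumptions the counts agree with the df column of Table 1 (1224, 1200, 1223, 1199, 1199 and 1149), which suggests the models were specified in roughly this way.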

Scale Invariance Across Groups. Like invariance across
time, invariance across groups cannot be assessed by a
single criterion. Rather, four related questions can be asked
about invariance across groups. For the three groups in the
present study the questions were:

1. Are the factor loadings (item-total regressions)
   invariant across groups? This involves simultaneously
   testing the LISREL item-total regression coefficients
   (Lambda estimates) across the three groups.

2. Are the estimated true scores invariant across groups?
   Statistically, this would involve testing the true
   score variances (Phi) for equality across the three
   groups.

3. Is the item uniqueness invariant across groups? This
   is a test of the invariance of the item uniquenesses
   (Theta variance estimates) across groups.

4. Is the internal consistency of the scale invariant
   across groups? This would involve noting the change in
   the internal consistency (alpha) estimates for each
   group.

To study scale invariance across groups, five models were
constructed using the data from all three ethnic groups
(Black, White and Hispanic). Models 1, 2, 3, 4 and 6 from
above were tested across the three groups.
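
A parallel parameter count can be sketched for the multiple-group models. It assumes, as an inference from the degrees of freedom in Table 1 rather than a statement in the text, that each group contributed one 25-item covariance matrix, with one reference loading fixed in every group. The names are hypothetical.

```python
N_ITEMS = 25
N_GROUPS = 3


def group_df(lambda_free, phi_free, theta_free):
    """df for a three-group, one-factor model with optional equality constraints."""
    moments = N_GROUPS * N_ITEMS * (N_ITEMS + 1) // 2            # 975
    loadings = (N_ITEMS - 1) * (N_GROUPS if lambda_free else 1)  # reference fixed
    factor_vars = N_GROUPS if phi_free else 1
    unique_vars = N_ITEMS * (N_GROUPS if theta_free else 1)
    return moments - (loadings + factor_vars + unique_vars)


group_models = {
    "All invariant": (False, False, False),
    "Lambda free": (True, False, False),
    "Phi free": (False, True, False),
    "Theta free (variance)": (False, False, True),
    "All free": (True, True, True),
}
group_df_by_model = {name: group_df(*flags) for name, flags in group_models.items()}
```

These counts (925, 877, 923, 875 and 825) match the df reported for the across-groups models in Table 1.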

Results and Discussion

The chi-square tests of model-data fit are reported in
Table 1 for the six models tested. The lower the chi-square
statistic, the better the model fit the original covariance
matrix used as input. All of the chi-square values shown in
column 1 were statistically significant at p < .05. This
finding was in part due to the large sample size. Bentler
and Bonett (1980) have suggested using a chi-square
difference test (equation 2) between alternate models to determine
the relative effectiveness of model-data fit.

    \chi^2_{1-2} = \chi^2_1 - \chi^2_2  and  df_{1-2} = df_1 - df_2,    (2)

where \chi^2_1 represented the most restrictive model and
\chi^2_2 represented an alternate model, with their
corresponding degrees of freedom. If the chi-square difference is
statistically significant, then the alternate model
represents a better fit to the data.
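
The difference test can be sketched in a few lines. The chi-square and df values below are those reported in Table 1 for the across-time models; the p-value uses the Wilson-Hilferty normal approximation to the chi-square distribution, which is an assumption made here to keep the example free of external libraries and is not the procedure used by LISREL. Function names are hypothetical.

```python
import math


def chi_square_sf_approx(x, df):
    """Approximate upper-tail p-value (Wilson-Hilferty approximation)."""
    if x <= 0:
        return 1.0
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * df))) / math.sqrt(2.0 / (9.0 * df))
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def chi_square_difference(chi_restricted, df_restricted, chi_alternate, df_alternate):
    """Return (chi-square difference, df difference, approximate p-value)."""
    diff = chi_restricted - chi_alternate
    df_diff = df_restricted - df_alternate
    return diff, df_diff, chi_square_sf_approx(diff, df_diff)


# Model 1 (invariant) against Model 2 (Lambda free) and Model 5
# (Theta covariance free), using the values reported in Table 1.
lambda_test = chi_square_difference(2427.47, 1224, 2410.05, 1200)
theta_cov_test = chi_square_difference(2427.47, 1224, 1920.33, 1199)
```

For the first contrast the approximate p-value is well above .05, and for the second it is far below .05, in line with the pattern reported in the text.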

Chi-square difference tests were conducted to answer the
first four questions posed in the previous section on
testing scale invariance across time. For questions 1-4, the
chi-square difference test was conducted by contrasting
Model 1 with Models 2-6. The results are shown in Table 1
column 3. For questions 1, 2 and 3, the factor structure, true
score variance and total item error uniquenesses were found
to be invariant across time, since the chi-square difference
tests were not statistically significant from the invariant
model (\chi^2_{1-2} = 17.42, df = 24; \chi^2_{1-3} = 2.4,
df = 1; \chi^2_{1-4} = 16.81, df = 25, respectively). However,
the item error covariances (question 4) were found to be
correlated across time (\chi^2_{1-5} = 507.14, df = 25,
p < .05). Thus, the item errors were not independent from
time 1 to time 2 and, as such, this procedure represented a
rejection of the classical test theory assumption regarding
uncorrelated errors. In addition, the chi-square difference
test between the invariant model and the unrestricted model
was also statistically significant (\chi^2_{1-6} = 627.22,
df = 75, p < .05). This finding
meant that the model with correlated errors was a better fit
to the data than the invariant model, which represented a
strict definition of classical test theory, where
uncorrelated errors were assumed. For this set of data, then, the
assumption of uncorrelated errors across testings did not
appear tenable, and it was basically this difference that
resulted in the unrestricted model being a better fit to the
data than the invariant model.

Finally, to emphasize the improvement in model-data fit
of Models 5 and 6 over Model 1, the delta index (Bentler &
Bonett, 1980) was calculated and is shown in Table 1 column
4. Delta represents an incremental index of fit that is
independent of sample size and is calculated as

    \Delta = (\chi^2_1 - \chi^2_2) / \chi^2_1

where \chi^2_1 is thought to represent the most restrictive
model and \chi^2_2 is an alternative model. The values of
delta range between zero and one. The results parallel those
of the chi-square difference test, where Models 5 and 6
represent a better fit of the original covariance matrix than
Model 1 (.209 and .258, respectively).
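
The delta values in Table 1 can be checked with a short calculation. The sketch assumes that delta was computed as the chi-square difference divided by the chi-square of the invariant model (Model 1); this form is inferred from the tabled values, whereas Bentler and Bonett's general definition places a baseline model's chi-square in the denominator. The function name is hypothetical.

```python
def delta_index(chi_restricted, chi_alternate):
    """Incremental fit: chi-square improvement relative to the restricted model."""
    return (chi_restricted - chi_alternate) / chi_restricted


# Chi-square values as reported in Table 1.
delta_model_5 = round(delta_index(2427.47, 1920.33), 3)   # across time, Model 5
delta_model_6 = round(delta_index(2427.47, 1800.25), 3)   # across time, Model 6
delta_groups = round(delta_index(1558.74, 1447.70), 3)    # across groups, all free
```

Because both chi-square statistics grow in proportion to sample size, their ratio does not, which is why delta can separate practical from statistical significance.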

Question 5, regarding the possible bias in test-retest
reliability due to correlated errors, was tested by noting
the difference in the Phi matrix from Model 1 to Models 5
and 6. The off-diagonal of the Phi matrix gives the amount
of covariance between the true scores at time 1 and time
2. This value, adjusted by the true score standard
deviations for time 1 and time 2, represents the
correlation, or test-retest reliability. Under the condition of
complete invariance (Model 1), the test-retest coefficient
was .630. When the errors were allowed to be correlated
(Models 5 and 6), the test-retest coefficient was .600 and
.570, respectively. Therefore, when measurement errors are
correlated between testings, the test-retest reliability may
be over- or underestimated. For this set of data, the
overestimate was very slight; however, it may not be so with
other data. Thus, psychometricians can test for correlated
errors and adjust for them, if need be, by using LISREL
procedures.
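
The adjustment described above is the usual conversion of a covariance to a correlation. In the sketch below the Phi entries are hypothetical, chosen only so that the result equals the reported Model 1 coefficient of .630; the estimated Phi values themselves are not given in the paper.

```python
import math


def factor_correlation(var_time1, var_time2, cov_12):
    """Correlation between the time 1 and time 2 true scores from a 2 x 2 Phi matrix."""
    return cov_12 / math.sqrt(var_time1 * var_time2)


# Hypothetical Phi entries (not estimates from the study).
test_retest = factor_correlation(0.04, 0.04, 0.0252)
```

Comparing this coefficient across models with and without correlated uniquenesses shows how much of the apparent stability is attributable to the shared error.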

Scale Invariance Across Groups

The chi-square statistic for model-data fit is reported
in Table 1 column 1 for the five models tested. All
chi-square values were statistically significant at p < .05. To
answer the first four questions posed for scale invariance
across groups, the chi-square difference test was run
comparing Model 1 to Models 2-5. The results are shown in
Table 1 column 3.

For Questions 1 and 3, the factor structure and item
uniquenesses were found to be invariant across the three
ethnic groups (\chi^2_{1-2} = 64.57, df = 48; \chi^2_{1-4} =
42.35, df = 50, p > .05, respectively). The delta index of
incremental fit for Models 2 and 4 was also very small (.041
and .027, respectively). Regarding Question 2, the true score
variance was found to differ significantly across the three
ethnic groups (\chi^2_{1-3} = 9.14, df = 2, p < .05). However,
the delta index of fit indicated that this difference was not
practically significant (.006) and illustrated how large
sample sizes can make trivial differences appear significant
when only the chi-square test or the chi-square difference
test is used. Also, the unrestricted model was not superior
to the invariant model using the chi-square difference test,
due to the large difference in degrees of freedom
(\chi^2_{1-5} = 111.04, df = 100, p > .05). Although slight,
the delta index was greatest for the unrestricted model
(.071), again indicating no practically significant
improvement over the strictly invariant model. Thus, the
factor structure, the amount of true score variance and the
item uniquenesses were invariant across the groups. This
procedure allows one to test the difference, if any, in the
construct being measured for each group. For this set of
data, the construct being measured was shown to be invariant
across groups, which represented a test of factorial stability.

For question 4, regarding the invariance of the scale's
internal consistency across groups, the alpha reliability
coefficient for time 1 for the White group was .78, for the
Black group .70 and for the Hispanics .60. Over all
groups, the reliability was .74 for time 1. For time 2, the
reliability coefficient for the White group was .81, for the
Black group .74 and for the Hispanics .74. The alpha
reliability for all groups was .79 for time 2. A slight increase
from time 1 to time 2 was noted for the White and Black
groups and a rather large increase was observed for the
Hispanic group. Since it was shown earlier that the item
uniquenesses were correlated for the White group from time 1 to
time 2, it may be that the item uniquenesses within each
group are likewise correlated. A test for correlated
errors has been reported by Maxwell (1968) and could be used
to determine if the differences noted in the internal
consistency estimates above were true differences or were due
to correlated errors within each group, which could produce
an over- or underestimate of the internal consistency at time
2.
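
For readers who wish to reproduce an internal consistency estimate of this kind, coefficient alpha can be computed from a respondents-by-items score matrix as sketched below. The response data are hypothetical dichotomous scores invented for illustration; they are not drawn from the Coopersmith data, and the function names are assumptions.

```python
def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def coefficient_alpha(item_scores):
    """Coefficient alpha; item_scores holds one list of item scores per respondent."""
    k = len(item_scores[0])
    item_vars = [sample_variance([row[i] for row in item_scores]) for i in range(k)]
    total_var = sample_variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents, 4 dichotomous items (1 = keyed response).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = coefficient_alpha(responses)
```

Computing alpha separately for each group and occasion yields the kind of comparison reported above, although alpha alone cannot show whether differences reflect correlated errors.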

Conclusions

An approach to testing scale invariance across time and
groups was demonstrated using LISREL V. The importance of
testing for scale invariance across time is that the
construct being measured may vary from time 1 to time 2 in
terms of its factor structure, true score and error
variance, as well as the accuracy or the stability of the
measurement. If the construct being measured changes from
time 1 to time 2, then problems in interpretation of the
construct will occur. If the item uniquenesses are
correlated from time 1 to time 2, then the test-retest
reliability coefficient will be biased. As was shown in the present
study, the factor structure and amount of true score and
error variance did not change across time, but the item
uniquenesses were correlated from time 1 to time 2 and resulted
in an overestimate of the test-retest coefficient.

Secondly, it was demonstrated that one can test for
scale invariance across groups. This is a test of the
stability of the construct being measured for the groups involved.
If the construct varies across independent groups, then the
confidence one would have in the interpretation made of the
observed score would not be very strong. Testing for scale
invariance across groups using LISREL provides a way to
confirm or disconfirm the similarity of the construct being
measured for each group. For the present study, the
unidimensional construct of self-concept was invariant across
the three ethnic groups studied, although the degree of item
homogeneity within groups differed. The difference in item
homogeneity may be attributed to correlated errors within
each group and could be tested using an analysis of variance
model proposed by Maxwell (1968). LISREL procedures are
potentially very useful, as they allow for the testing of
correlated errors and their effect on scale invariance across
time and the testing of scale invariance across groups.
Table 1

Goodness of Fit Indices for Models Tested

Model                          Chi-square     df   Chi-square diff/df   Delta

Scale Invariance Across Time

1. All invariant                  2427.47   1224
2. Lambda free                    2410.05   1200    17.42/24            .007
3. Phi free                       2425.07   1223     2.40/1             .000
4. Theta free (Variance)          2410.66   1199    16.81/25            .007
5. Theta free (Covariance)        1920.33   1199   507.14/25*           .209
6. All free                       1800.25   1149   627.22/75*           .258

Scale Invariance Across Groups

1. All invariant                  1558.74    925
2. Lambda free                    1494.17    877    64.57/48            .041
3. Phi free                       1549.60    923     9.14/2*            .006
4. Theta free (Variance)          1516.39    875    42.35/50            .027
5. All free                       1447.70    825   111.04/100           .071

*p < .05